1 Foreword

This document is an assignment for the Data Visualization course at Poznan University of Technology. The course is taught by Dariusz Brzeziński during the 4th semester of the Artificial Intelligence Bachelor's degree.

The assignment applies the grammar of graphics to create rich visualizations from the data we were provided with. The data consists of two data sets, and we visualize each of them below. As stated in the assignment description:

  • The data in the Sectors folder present the percentage changes of stock prices and trading volume in selected sectors
    • Some of the data sets also contain information about the media sentiment about the companies
  • The Correlations.csv data set contains correlations between the stock prices of pairs of companies identified by stock symbols (tickers)

For each of the following visualizations, we provide a brief description and the reasoning behind it.

1.1 Credits

We have to credit the lecturer, Dariusz Brzeziński, for the interactive tables we have used. They were built using the DT package.

The interactive tables should work as intended in the HTML version of the document; however, they will not be visible in the PDF version. For that reason, a standard head of each data frame is displayed.
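One way such a format-dependent table could be wired up is sketched below. This is only an illustration under assumptions, not the code used in this report: `show_table` is a hypothetical helper, and the toy data frame reuses two rows from the correlations data shown later. It relies on `knitr::is_html_output()` to detect the output format and on `DT::datatable()` for the interactive table.

```r
# Hypothetical helper (an assumption, not part of the original report):
# choose the table renderer based on the output format being knitted.
show_table <- function(df) {
  if (knitr::is_html_output()) {
    DT::datatable(df)  # interactive table, only meaningful in HTML output
  } else {
    head(df)           # static fallback, e.g. for PDF output
  }
}

# Toy data frame built from two rows of the correlations data
toy <- data.frame(
  Ticker.1 = c("GS", "AAPL"),
  Ticker.2 = c("JPM", "MSFT"),
  Correlation.Value = c(0.7955952, 0.7069591)
)

out <- show_table(toy)
```

When called outside of an HTML knit, the helper falls back to the plain `head()` of the data frame.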

You can access the project’s repository on GitHub: https://github.com/bujowskis/put-DV/tree/main/ass-2

2 Stocks by sectors

2.1 The data

For simplicity, we show two of the 8 data sets: one representing the sets that include sentiment, and one representing the sets where sentiment is missing.
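A data set can be assigned to one of these two groups by checking whether its Sentiment column holds any non-missing value. The sketch below is an illustration only: `has_sentiment` is a hypothetical helper, and the two toy data frames are stand-ins built from a few values shown in sections 2.1.1 and 2.1.2.

```r
# Hypothetical helper (an assumption): TRUE when the data frame has a
# Sentiment column with at least one non-missing value.
has_sentiment <- function(df) {
  "Sentiment" %in% names(df) && any(!is.na(df$Sentiment))
}

# Toy stand-ins for a sector with and without sentiment values
with_sent <- data.frame(Symbol = c("ANTM", "CTLT"), Sentiment = c(0.68, 0.66))
without_sent <- data.frame(Symbol = c("BBBY", "F"), Sentiment = c(NA, NA))

sectors <- list(with_sent = with_sent, without_sent = without_sent)
flags <- sapply(sectors, has_sentiment)
```

The resulting `flags` vector is named after the list entries, so it can be used directly to split the sectors into the two groups.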

2.1.1 Sentiment included

##    X Symbol                 Name   Volume X1dC.  X1dV.   Open   High  Close
## 1  7   ANTM               Anthem   927200  1.44  -3.25 461.80 465.03 464.86
## 2 16   CTLT             Catalent   800600  3.90 -41.72 128.03 128.26 124.49
## 3 53    SYK  Stryker Corporation  1169700 -3.48 -69.48 267.41 268.87 268.42
## 4 52    STE               Steris   351600 -1.07 -36.01 243.11 243.76 242.57
## 5 59   VTRS              Viatris 11959600 -1.13  -4.98  13.60  14.30  14.21
## 6 41    MCK McKesson Corporation   643400  0.04  -3.25 247.54 248.44 248.10
##   Volume.1 Sentiment
## 1   927200      0.68
## 2   800600      0.66
## 3  1169700      0.63
## 4   351600      0.62
## 5 11959600      0.60
## 6   643400      0.56

2.1.2 Sentiment missing

##    X Symbol                                Name    Volume X1dC.  X1dV.  Open
## 1  5   BBBY              Bed Bath & Beyond Inc. 105519200 -5.30  82.21 30.00
## 2 29      F                  Ford Motor Company  87711400 -0.38 -13.84 16.84
## 3 14    CCL          Carnival Corporation & plc  67608300 -2.25  -0.28 17.30
## 4 60   NCLH Norwegian Cruise Line Holdings Ltd.  39182900 -3.77   3.61 17.17
## 5 49   LCID                   Lucid Group, Inc.  35016600 -4.62   4.76 22.94
## 6 21   DKNG                     DraftKings Inc.  29355000  3.71  -6.35 20.58
##    High Close Sentiment
## 1 30.06 21.71        NA
## 2 16.90 15.97        NA
## 3 17.48 15.53        NA
## 4 17.38 15.38        NA
## 5 24.41 23.17        NA
## 6 20.89 18.05        NA

2.2 Sketch

TODO

2.3 The visualization

# The idea is to show the change in closing price of all the stocks of a particular sector in one plot.
# For that we use a treemap: tile size encodes trading volume and tile color encodes the plotted value.

library(treemap)
# We show two sectors: energy, colored by the change in closing price, and IT, colored by sentiment
energy <- read.csv("Dataset/Sectors/energy.csv")
IT <- read.csv("Dataset/Sectors/it.csv")
treemap(energy, index = c("Symbol"), vSize = "Volume", vColor = "X1dC.", type = "value", border.col = "black",
        border.lwds = 1, title = "Energy Sector", title.legend = "Change in Close price in %")

treemap(IT, index = c("Symbol"), vSize = "Volume", vColor = "Sentiment", type = "value", border.col = "black",
        border.lwds = 1, title = "IT Sector", title.legend = "Sentiment Score")

3 Correlations

3.1 The data

##   Ticker.1 Ticker.2 Correlation.Value
## 1       GS      JPM         0.7955952
## 2     AAPL     MSFT         0.7069591
## 3      AXP      JPM         0.6833357
## 4       KO       PG         0.6553540
## 5      CRM     MSFT         0.6464821
## 6      HON      MMM         0.6289362

3.2 Sketch

The choice of a static visualization for the correlations was obvious from the beginning: a heat map of the correlation matrix. For that reason, we did not draw a sketch.

To handle a missing correlation value, it suffices to use NA, which results in a missing tile in the visualization.

However, there were no such cases in this data set, so this feature cannot be seen.
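Since the NA handling is not visible in the real data, the toy sketch below shows it on three tickers with one pair deliberately left out. It is an illustration under assumptions, not the code used for the plot: it builds the matrix with base R name-based indexing, and the names `pairs`, `m` and `long` are introduced here only for the example. The two correlation values are taken from the head shown in section 3.1.

```r
# Toy pairs: GS/JPM and AXP/JPM come from the data head; the GS/AXP pair
# is deliberately omitted to simulate a missing correlation.
pairs <- data.frame(
  Ticker.1 = c("GS", "AXP"),
  Ticker.2 = c("JPM", "JPM"),
  Correlation.Value = c(0.7955952, 0.6833357),
  stringsAsFactors = FALSE
)

tickers <- union(pairs$Ticker.1, pairs$Ticker.2)

# Start from an all-NA matrix so that unspecified pairs stay NA
m <- matrix(NA_real_, nrow = length(tickers), ncol = length(tickers),
            dimnames = list(tickers, tickers))

# Fill both directions, since correlation is symmetric
m[cbind(pairs$Ticker.1, pairs$Ticker.2)] <- pairs$Correlation.Value
m[cbind(pairs$Ticker.2, pairs$Ticker.1)] <- pairs$Correlation.Value
diag(m) <- 1  # a stock is perfectly correlated with itself

# Long format (one row per tile), as expected by ggplot2's geom_tile()
long <- as.data.frame(as.table(m), responseName = "val")
names(long)[1:2] <- c("ticker1", "ticker2")
```

The GS/AXP cells remain NA in `long`, so a heat map drawn from it would show them with the NA color instead of a correlation color.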

3.3 The visualization

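library(ggplot2)
library(plotly)
library(dplyr)  # mutate() and %>% used below

# cor_data is assumed to have been read from Correlations.csv beforehand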

# get all unique tickers
ut <- data.frame(tickers=union(cor_data$Ticker.1, cor_data$Ticker.2))
rut <- data.frame(tickers=rev(ut$tickers))  # save a reversed copy for later

# create dataframe of all combinations
df <- expand.grid(ticker1=rut$tickers, ticker2=ut$tickers)

# read the correlation values
df$val <- NA # correlation not specified, cell will be drawn with na.value (white)
for (i in 1:nrow(cor_data)) {
  # read from the dataset
  df$val[length(ut$tickers)*(match(cor_data$Ticker.1[i], ut$tickers) - 1) +
               match(cor_data$Ticker.2[i], rut$tickers)] = cor_data$Correlation.Value[i]
  # it's bidirectional
  df$val[length(ut$tickers)*(match(cor_data$Ticker.2[i], ut$tickers) - 1) +
               match(cor_data$Ticker.1[i], rut$tickers)] = cor_data$Correlation.Value[i]
}
j = length(ut$tickers)
for (i in 0:(length(ut$tickers) - 1)) {
  # remove upper triangle
  for (k in 0:i) {
    df$val[j - k] = NA
  }
  j = j + length(ut$tickers)
}
for (i in 0:(length(ut$tickers) - 1)) {
  # correlation = 1 between the same stock
  df$val[length(ut$tickers) + i*(length(ut$tickers) - 1)] = 1
}

# text for tooltip
df <- df %>%
  mutate(text = paste0(ticker1, "\n", ticker2, "\n", "Val: ", val))

# Heatmap 
p = ggplot(df, aes(ticker1, ticker2, fill=val)) + 
  geom_tile() +
  geom_text(aes(label=round(val, 2)),
            size=6
            ) +
  #scale_x_discrete(guide=guide_axis(n.dodge=2)) +
  theme(axis.title.x=element_blank(),  # remove x axis title
        axis.title.y=element_blank(),  # remove y axis title,
        text=element_text(size=20),
        axis.text=element_text(size=20),
        legend.key.size = unit(2, 'cm'),
        legend.key.height = unit(2, 'cm'),
        legend.key.width = unit(2, 'cm'),
        axis.text.x=element_text(angle=45, hjust=1)
        ) +
  scale_fill_gradient2(low="white", high="blue",
                       limits=c(0, 1),
                       na.value="white"
                       ) +
  ggtitle("Stocks correlation matrix")
p
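
The code above loads plotly and prepares a `text` column for a tooltip, but only the static plot is printed. A minimal sketch of how such a column could feed an interactive version is given below. It is an assumption about the intended use, not the report's own code: it maps `text` as an aesthetic and passes the plot to `ggplotly()` with `tooltip = "text"`, on a toy two-ticker data frame whose names (`toy`, `p_toy`, `p_int`) are introduced only for the example.

```r
library(ggplot2)
library(plotly)

# Toy two-ticker data; the GS/JPM value comes from the data head
toy <- data.frame(
  ticker1 = c("GS", "JPM", "GS", "JPM"),
  ticker2 = c("GS", "GS", "JPM", "JPM"),
  val = c(1, 0.7955952, 0.7955952, 1)
)
toy$text <- paste0(toy$ticker1, "\n", toy$ticker2, "\nVal: ", round(toy$val, 2))

# Map the text column as an aesthetic so that plotly can pick it up
p_toy <- ggplot(toy, aes(ticker1, ticker2, fill = val, text = text)) +
  geom_tile() +
  scale_fill_gradient2(low = "white", high = "blue", limits = c(0, 1), na.value = "white")

# Convert to an interactive plot that shows only the text column on hover
p_int <- ggplotly(p_toy, tooltip = "text")
```

Printing `p_int` in an HTML document would render the interactive heat map; in PDF output the static ggplot object remains the safer choice.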